1. Introduction: Serfling’s finite sampling exponential bound

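Suppose that $\{c_1, \ldots, c_N\}$ is a finite population with each $c_i \in \mathbb{R}$. For $n \le N$, let $Y_1, \ldots, Y_n$ be a sample drawn from $\{c_1, \ldots, c_N\}$ without replacement; we can regard the finite population $\{c_1, \ldots, c_N\}$ as an urn containing $N$ balls labeled with the numbers $c_1, \ldots, c_N$. Some notation: we let

$$\mu_N = N^{-1}\sum_{i=1}^N c_i = \bar{c}_N, \qquad \sigma_N^2 = N^{-1}\sum_{i=1}^N (c_i - \bar{c}_N)^2,$$

$$a_N = \min_{1 \le i \le N} c_i, \qquad b_N = \max_{1 \le i \le N} c_i.$$

It is well-known (see e.g. Rice [2007], Theorem B, page 208) that $\bar{Y}_n = n^{-1}\sum_{i=1}^n Y_i$ satisfies $E(\bar{Y}_n) = \mu_N$ and

$$\mathrm{Var}(\bar{Y}_n) = \frac{\sigma_N^2}{n}\left(1 - \frac{n-1}{N-1}\right) = \frac{\sigma_N^2}{n}(1 - f_n). \tag{1.1}$$

Serfling [1974], Corollary 1.1, shows that for all $\lambda > 0$

$$P\left(\sqrt{n}(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{(1 - f_n^*)(b_N - a_N)^2}\right), \tag{1.2}$$

where $f_n = (n-1)/(N-1)$ and $f_n^* = (n-1)/N$.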
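As a small sanity check of (1.1) and (1.2), the following Python sketch enumerates every size-$n$ subset of a toy population, computes the exact mean and variance of $\bar{Y}_n$ in rational arithmetic, and compares an exact tail probability with Serfling's bound. The population values and all function names are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from itertools import combinations
from math import exp, sqrt

def sample_mean_moments(population, n):
    """Exact mean and variance of the sample mean over all size-n subsets."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v

def formula_variance(population, n):
    """Right-hand side of (1.1): (sigma_N^2 / n) * (1 - (n-1)/(N-1))."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((Fraction(c) - mu) ** 2 for c in population) / N
    return sigma2 / n * (1 - Fraction(n - 1, N - 1))

def exact_tail(population, n, lam):
    """P(sqrt(n) * (Ybar_n - mu_N) >= lam), by brute-force enumeration."""
    N = len(population)
    mu = sum(population) / N
    subsets = list(combinations(population, n))
    hits = sum(1 for s in subsets if sqrt(n) * (sum(s) / n - mu) >= lam)
    return hits / len(subsets)

def serfling_bound(population, n, lam):
    """Right-hand side of (1.2) with f_n^* = (n-1)/N."""
    N = len(population)
    f_star = (n - 1) / N
    width = max(population) - min(population)
    return exp(-2 * lam ** 2 / ((1 - f_star) * width ** 2))

pop = [1, 2, 2, 5, 7, 10]          # hypothetical urn contents
mean, var = sample_mean_moments(pop, 3)
print(mean, var, formula_variance(pop, 3))
print(exact_tail(pop, 3, 2.0), serfling_bound(pop, 3, 2.0))
```

Enumeration is only feasible for tiny $N$; it is used here because it makes the identity (1.1) exact rather than approximate.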

This is an inequality of the type proved by Hoeffding [1963] for sampling with replacement and more generally for sums of independent bounded random variables. Comparing (1.1) and (1.2), it seems reasonable to ask whether the factor $f_n^*$ in (1.2) can be improved to $f_n = (n-1)/(N-1)$. Indeed Serfling ends his paper (on page 47) with the remark: “(it is) also of interest to obtain (1.2) with the usual sampling fraction instead of $f_n^*$”. Note that when $n = N$, $\bar{Y}_n = \mu_N$, and hence the probability in (1.2) is 0 for all $\lambda > 0$; the conjectured improvement of Serfling’s bound agrees with this, while Serfling’s bound itself is positive when $n = N$.

Despite related results due to Kemperman [1973a,b,c], it seems that a definitive answer to this question 
is not yet known. 

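A special case of considerable importance is the case when the numbers on the balls in the urn are all 1’s and 0’s: suppose that $c_1 = \cdots = c_D = 1$, while $c_{D+1} = \cdots = c_N = 0$. Then $X = n\bar{Y}_n = \sum_{i=1}^n Y_i$ is well-known to have a Hypergeometric$(n, D, N)$ distribution given by

$$P\left(\sum_{i=1}^n Y_i = k\right) = \frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}}, \qquad \max\{0, D + n - N\} \le k \le \min\{n, D\}.$$

In this special case $\mu_N = D/N$, $\sigma_N^2 = \mu_N(1 - \mu_N)$, while $b_N = 1$ and $a_N = 0$. Thus Serfling’s inequality (1.2) becomes

$$P\left(\sqrt{n}(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{1 - f_n^*}\right) \quad \text{for all } \lambda > 0,$$

and the conjectured improvement is

$$P\left(\sqrt{n}(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{1 - f_n}\right) \quad \text{for all } \lambda > 0.$$
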


Despite related results due to Chvatal [1979] and Hush and Scovel [2005] it seems that a bound of the form 
in the last display remains unknown. 
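To make the comparison concrete, here is a minimal Python sketch that evaluates the exact hypergeometric upper tail next to Serfling's bound (with $f_n^* = (n-1)/N$) and the conjectured bound (with $f_n = (n-1)/(N-1)$). The function names and the parameter values are assumptions chosen for illustration; the conjectured bound is only evaluated, not asserted.

```python
from math import comb, exp, sqrt

def hypergeom_pmf(k, n, D, N):
    """P(X = k) for X ~ Hypergeometric(n, D, N); zero outside the support."""
    if k < max(0, D + n - N) or k > min(n, D):
        return 0.0
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)

def upper_tail(lam, n, D, N):
    """P(sqrt(n) * (X/n - D/N) >= lam)."""
    mu = D / N
    return sum(hypergeom_pmf(k, n, D, N)
               for k in range(n + 1)
               if sqrt(n) * (k / n - mu) >= lam)

def serfling(lam, n, N):
    """Serfling's bound in the 0/1 case: exp(-2 lam^2 / (1 - f_n^*))."""
    return exp(-2 * lam ** 2 / (1 - (n - 1) / N))

def conjectured(lam, n, N):
    """Conjectured bound exp(-2 lam^2 / (1 - f_n)); requires n < N."""
    return exp(-2 * lam ** 2 / (1 - (n - 1) / (N - 1)))

n, D, N = 10, 8, 20                 # hypothetical urn: 8 ones among 20 balls
for lam in (0.3, 0.6, 1.0):
    print(lam, upper_tail(lam, n, D, N), conjectured(lam, n, N), serfling(lam, n, N))
```

Since $f_n > f_n^*$ whenever $1 < n < N$, the conjectured bound is always the smaller of the two; the open question is whether it still dominates the exact tail.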

We should note that an exponential bound of the Bennett type for the tails of the hypergeometric 
distribution does follow from results of Vatutin and Mikhailov [1982] and Ehm [1991]; see also Pitman 
[1997]. 


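Theorem 1. (Ehm, 1991) If $1 \le n \le D \wedge (N - D)$, then $\sum_{i=1}^n Y_i \stackrel{d}{=} \sum_{i=1}^n X_i$ where $X_i \sim$ Bernoulli$(\pi_i)$, with $\pi_i \in (0,1)$, are independent.


It follows from Theorem 1 that

$$n(D/N) = E\left(\sum_{i=1}^n Y_i\right) = E\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \pi_i,$$

$$n\,\frac{D}{N}\left(1 - \frac{D}{N}\right)\left(1 - \frac{n-1}{N-1}\right) = \mathrm{Var}\left(\sum_{i=1}^n Y_i\right) = \mathrm{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \pi_i(1 - \pi_i).$$
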


Furthermore, by applying Theorem 1 together with Bennett’s inequality (Bennett [1962]; see also Shorack and Wellner [1986], page 851), we obtain the following exponential bound for the tail of the hypergeometric distribution:

Corollary 1. If $1 \le n \le D \wedge (N - D)$, then for all $\lambda > 0$

$$P\left(\sqrt{n}(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \exp\left(-\frac{\lambda^2}{2\sigma_N^2(1 - f_n)}\,\psi\left(\frac{\lambda}{\sqrt{n}\,\sigma_N^2(1 - f_n)}\right)\right)$$

where $\mu_N = D/N$, $\sigma_N^2 = \mu_N(1 - \mu_N)$, $1 - f_n = 1 - (n-1)/(N-1)$ is the finite-sampling correction factor, and $\psi(y) = 2y^{-2}h(1 + y)$ where $h(y) = y(\log y - 1) + 1$.

Since $\sigma_N^2 = \mu_N(1 - \mu_N) \le 1/4$, the inequality of the corollary yields a further bound which is quite close to the conjectured Hoeffding type improvement of Serfling’s bound, and which now has the desired finite-sampling correction factor $1 - f_n$.

Corollary 2.

$$P\left(\sqrt{n}(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{1 - f_n}\,\psi\left(\frac{4\lambda}{\sqrt{n}(1 - f_n)}\right)\right).$$

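The following Python sketch evaluates a Bennett-type bound for the hypergeometric tail and compares it with the exact tail. It assumes the standard form of Bennett's inequality applied to $\sum X_i$ with total variance $n\sigma_N^2(1-f_n)$ and summands bounded by 1, which gives $\exp\{-\frac{\lambda^2}{2\sigma_N^2(1-f_n)}\psi(\frac{\lambda}{\sqrt{n}\sigma_N^2(1-f_n)})\}$; the second function replaces $\sigma_N^2$ by its upper bound $1/4$. Function names and parameter values are illustrative assumptions.

```python
from math import comb, exp, log, sqrt

def h(y):
    """h(y) = y (log y - 1) + 1."""
    return y * (log(y) - 1.0) + 1.0

def psi(y):
    """psi(y) = 2 h(1 + y) / y^2; psi(y) -> 1 as y -> 0."""
    return 2.0 * h(1.0 + y) / (y * y)

def bennett_type_bound(lam, n, D, N):
    """Bennett-type bound using the exact variance factor sigma_N^2 (1 - f_n)."""
    mu = D / N
    v = mu * (1.0 - mu) * (1.0 - (n - 1) / (N - 1))
    return exp(-(lam * lam) / (2.0 * v) * psi(lam / (sqrt(n) * v)))

def quarter_variance_bound(lam, n, N):
    """Same bound with sigma_N^2 replaced by its maximum value 1/4."""
    one_minus_f = 1.0 - (n - 1) / (N - 1)
    return exp(-2.0 * lam * lam / one_minus_f
               * psi(4.0 * lam / (sqrt(n) * one_minus_f)))

def exact_tail(lam, n, D, N):
    """P(sqrt(n) * (X/n - D/N) >= lam) for X ~ Hypergeometric(n, D, N)."""
    mu = D / N
    lo, hi = max(0, D + n - N), min(n, D)
    total = sum(comb(D, k) * comb(N - D, n - k)
                for k in range(lo, hi + 1)
                if sqrt(n) * (k / n - mu) >= lam)
    return total / comb(N, n)

n, D, N = 10, 15, 40                # satisfies 1 <= n <= min(D, N - D)
for lam in (0.3, 0.7, 1.2):
    print(lam, exact_tail(lam, n, D, N),
          bennett_type_bound(lam, n, D, N), quarter_variance_bound(lam, n, N))
```

Because the Bennett exponent is monotone in the variance, the quarter-variance version is never smaller than the version using the true $\sigma_N^2$.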

By considerations related to the work of Talagrand [1994] and Leon and Perron [2003], the first author of 
this paper has succeeded in proving the following exponential bound. 


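Theorem 2. (Greene, 2014) Suppose that $\sum_{i=1}^n Y_i \sim$ Hypergeometric$(n, D, N)$. Define $\mu_N = D/N$ and suppose $N > 4$ and $2 \le n < D \le N/2$. Then for all $0 < \lambda < \sqrt{n}/2$ we have

$$P\left(\sqrt{n}(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \sqrt{\frac{1}{2\pi\lambda^2}}\left(\frac{1}{2}\sqrt{\frac{N-n}{N}}\right)\left(\frac{\sqrt{n} + 2\lambda}{\sqrt{n} - 2\lambda}\right)\left(\frac{N - n + 2\sqrt{n}\lambda}{N - n - 2\sqrt{n}\lambda}\right)$$

$$\qquad\qquad \cdot \exp\left(-\frac{2}{1 - n/N}\lambda^2\right)\exp\left(-\frac{1}{3}\left(1 + \frac{n^3}{(N-n)^3}\right)\frac{\lambda^4}{n}\right).$$
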


The proof of this bound, along with a complete analogue for the hypergeometric distribution of a bound of 
Talagrand (1994) for the binomial distribution, appears in Greene and Wellner [2015] and in the forthcoming 
Ph.D. thesis of the first author, Greene [2016]. 

The bound given in Theorem 2 involves a still better finite-sampling correction factor, namely $1 - n/N$, which has also appeared in Lo [1986] in the context of a Bayesian analysis of finite sampling. Note that as $N \to \infty$, the above bound yields

$$\limsup_{N \to \infty} P\left(\sqrt{n}(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \sqrt{\frac{1}{2\pi\lambda^2}}\cdot\frac{1}{2}\left(\frac{\sqrt{n} + 2\lambda}{\sqrt{n} - 2\lambda}\right)\exp\left(-2\lambda^2 - \frac{\lambda^4}{3n}\right),$$

a bound which improves slightly on the bound given by Leon and Perron [2003] in the case of sums of i.i.d. Bernoulli random variables.

Before leaving this section we begin to make a connection to finite-sampling empirical distributions. Let $\mathbb{F}_n(t) = n^{-1}\sum_{i=1}^n 1_{(-\infty,t]}(Y_i)$ and $F_N(t) = N^{-1}\sum_{i=1}^N 1_{(-\infty,t]}(c_i)$. Then it is easily seen that Serfling’s bound yields

$$P\left(\sqrt{n}(\mathbb{F}_n(t) - F_N(t)) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{1 - (n-1)/N}\right)$$

for each fixed $\lambda > 0$ and $t \in \mathbb{R}$. Note that since $\mathbb{F}_n(t)$ is equal in distribution to the sample mean of $n$ draws without replacement from an urn containing $N F_N(t)$ 1’s and $N(1 - F_N(t))$ 0’s, the bound in the last display only involves the hypergeometric special case of Serfling’s inequality. This leads to the following conjecture concerning bounds for the finite sampling empirical process $\{\sqrt{n}(\mathbb{F}_n(t) - F_N(t)) : t \in \mathbb{R}\}$:

Conjecture: There exist constants $C, D > 0$ (possibly $C = 1$ and $D = 2$?) such that

$$P\left(\sqrt{n}\sup_t(\mathbb{F}_n(t) - F_N(t)) \ge \lambda\right) \le C\exp\left(-\frac{2\lambda^2}{1 - f_n}\right), \tag{1.3}$$

$$P\left(\sqrt{n}\sup_t|\mathbb{F}_n(t) - F_N(t)| \ge \lambda\right) \le D\exp\left(-\frac{2\lambda^2}{1 - f_n}\right) \tag{1.4}$$

for all $\lambda > 0$. The possibility that $D = 2$ is suggested by the corresponding inequality established by Massart [1990] in the case of sampling with replacement.

With these strong indications of the plausibility of an improvement of Serfling’s bound and corresponding 
improvements in exponential bounds for the uniform-norm deviations of the finite-sampling empirical process, 
we can now turn to an application of the basic idea in the context of two-sample Kolmogorov-Smirnov 
statistics. 

2. Two-sample tests and finite-sampling connections 


To connect this with the two-sample Kolmogorov-Smirnov statistics, suppose that $X_1, \ldots, X_m$ are i.i.d. $F$ and $Y_1, \ldots, Y_n$ are i.i.d. $G$. Let $N = m + n$. Then for testing $H_c : F = G$ with $F$ continuous versus $K^+ : F \ge G$ ($F \ne G$), $K^- : G \ge F$ ($G \ne F$), or $K : F \ne G$, the classical K-S test statistics are

$$D^+_{m,n} = \sqrt{\frac{mn}{N}}\,\sup_x(\mathbb{F}_m(x) - \mathbb{G}_n(x)),$$

$$D^-_{m,n} = \sqrt{\frac{mn}{N}}\,\sup_x(\mathbb{G}_n(x) - \mathbb{F}_m(x)), \quad \text{and}$$

$$D_{m,n} = \sqrt{\frac{mn}{N}}\,\sup_x|\mathbb{F}_m(x) - \mathbb{G}_n(x)|$$

respectively. It is well-known that under $H_c$ we have

$$D^+_{m,n} \to_d \sup_{0 \le t \le 1}\mathbb{U}(t), \qquad D_{m,n} \to_d \sup_{0 \le t \le 1}|\mathbb{U}(t)|$$

if $m \wedge n \to \infty$ where $\mathbb{U}$ is a standard Brownian bridge process on $[0,1]$; see e.g. Hajek and Sidak [1967], pages 189-190, Hodges [1958], and van der Vaart and Wellner [1996], pages 360-366.

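Note that with $\lambda_N = m/N$ and

$$\mathbb{H}_N = \lambda_N\mathbb{F}_m + (1 - \lambda_N)\mathbb{G}_n = N^{-1}\sum_{i=1}^N 1_{(-\infty,\cdot]}(Z_{(i)})$$

where $Z_{(1)} \le \cdots \le Z_{(N)}$ are the order statistics of the pooled sample, we have

$$\mathbb{F}_m - \mathbb{H}_N = \mathbb{F}_m - \lambda_N\mathbb{F}_m - (1 - \lambda_N)\mathbb{G}_n = (1 - \lambda_N)(\mathbb{F}_m - \mathbb{G}_n), \quad \text{and}$$

$$\mathbb{G}_n - \mathbb{H}_N = \mathbb{G}_n - \lambda_N\mathbb{F}_m - (1 - \lambda_N)\mathbb{G}_n = \lambda_N(\mathbb{G}_n - \mathbb{F}_m),$$

and hence, with $\bar{\lambda}_N = 1 - \lambda_N$,

$$\sqrt{\frac{mn}{N}}(\mathbb{F}_m - \mathbb{G}_n) = \sqrt{N}\sqrt{\lambda_N\bar{\lambda}_N}\,\frac{1}{\bar{\lambda}_N}(\mathbb{F}_m - \mathbb{H}_N) = \frac{1}{\sqrt{\bar{\lambda}_N}}\sqrt{m}(\mathbb{F}_m - \mathbb{H}_N),$$

$$\sqrt{\frac{mn}{N}}(\mathbb{G}_n - \mathbb{F}_m) = \sqrt{N}\sqrt{\lambda_N\bar{\lambda}_N}\,\frac{1}{\lambda_N}(\mathbb{G}_n - \mathbb{H}_N) = \frac{1}{\sqrt{\lambda_N}}\sqrt{n}(\mathbb{G}_n - \mathbb{H}_N).$$

Thus, using the independence of the ranks $R$ and the order statistics $Z$,

$$P(D^+_{m,n} \ge t) = E_Z P_R\left(\sqrt{m}\,\|(\mathbb{F}_m - \mathbb{H}_N)^+\|_\infty \ge \sqrt{\bar{\lambda}_N}\,t\right)$$

and it would follow from (1.3) that

$$P(D^+_{m,n} \ge t) \le C\exp\left(-2\bar{\lambda}_N t^2/(1 - f_m)\right) \le C\exp\left(-2(n/N)t^2/(n/(N-1))\right) = C\exp\left(-2\,\frac{N-1}{N}\,t^2\right) \tag{2.1}$$

for all $t > 0$. Similarly it would also follow from (1.3) that

$$P(D^-_{m,n} \ge t) \le C\exp\left(-2\lambda_N t^2/(1 - f_n)\right) \le C\exp\left(-2(m/N)t^2/(m/(N-1))\right) = C\exp\left(-2\,\frac{N-1}{N}\,t^2\right)$$

for all $t > 0$. Combining the two one-sided inequalities yields a (conjectured) two-sided inequality:

$$P(D_{m,n} \ge t) = P\left(\sqrt{mn/N}\,\|\mathbb{F}_m - \mathbb{G}_n\|_\infty \ge t\right) \le P(D^+_{m,n} \ge t) + P(D^-_{m,n} \ge t) \le 2C\exp\left(-2\,\frac{N-1}{N}\,t^2\right).$$
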

In the next section we will prove that bounds of this type with $C = 1$ and $D = 2$ hold in the special case $m = n$. For some results for the two-sided two-sample Kolmogorov-Smirnov statistic in the case $m = n$ and computational results for $m \ne n$, see Wei and Dudley [2012]. These authors were aiming for a bound of the form $C\exp(-2t^2)$ both for $m = n$ and $m \ne n$. The above heuristics seem to suggest that a bound of the form $C\exp(-2((N-1)/N)t^2)$ might be a natural goal.
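The algebra behind the heuristic can be checked numerically. The Python sketch below, with illustrative sample sizes and helper names that are assumptions of this example, verifies on simulated data that the pooled empirical distribution function equals $\lambda_N\mathbb{F}_m + \bar{\lambda}_N\mathbb{G}_n$, that $\sqrt{mn/N}(\mathbb{F}_m - \mathbb{G}_n) = \sqrt{m}(\mathbb{F}_m - \mathbb{H}_N)/\sqrt{\bar{\lambda}_N}$, and that the coefficient $2\bar{\lambda}_N/(1 - f_m)$ simplifies to $2(N-1)/N$.

```python
import random
from math import sqrt

def ecdf(sample, x):
    """Empirical distribution function of `sample` evaluated at x."""
    return sum(1 for s in sample if s <= x) / len(sample)

def max_identity_gap(xs, ys):
    """Largest absolute discrepancy in the two identities over pooled points."""
    m, n = len(xs), len(ys)
    N = m + n
    lam, lam_bar = m / N, n / N
    pooled = xs + ys
    gap = 0.0
    for z in pooled:
        Fm, Gn = ecdf(xs, z), ecdf(ys, z)
        HN = ecdf(pooled, z)                       # pooled empirical d.f.
        gap = max(gap, abs(HN - (lam * Fm + lam_bar * Gn)))
        lhs = sqrt(m * n / N) * (Fm - Gn)
        rhs = sqrt(m) * (Fm - HN) / sqrt(lam_bar)
        gap = max(gap, abs(lhs - rhs))
    return gap

def heuristic_exponent(m, n):
    """Coefficient of t^2 in the heuristic bound: 2 * lam_bar / (1 - f_m)."""
    N = m + n
    return 2 * (n / N) / (1 - (m - 1) / (N - 1))

random.seed(0)
xs = [random.random() for _ in range(7)]
ys = [random.random() for _ in range(11)]
print(max_identity_gap(xs, ys))
print(heuristic_exponent(7, 11), 2 * (18 - 1) / 18)
```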


3. An exponential bound for $D^+_{m,n}$ when $m = n$


Throughout this section we suppose that the null hypothesis $H_c$ holds: $G = F$ is a continuous distribution function.

From Hodges [1958], (2.3) on page 473 (together with $t = \sqrt{mn/N}\,d$ and $d = a/n$ from page 473, line 4), when $m = n$ (so $N = 2n$),

$$P(D^+_{n,n} \ge t) = \binom{2n}{n+a}\Big/\binom{2n}{n}.$$

We first compare the exact probability from the last display with the possible upper bounds

$$PB_2(n) = \exp\left(-2\,\frac{2n-1}{2n}\,\frac{a^2}{2n}\right), \qquad PB_3(n) = \exp\left(-2\,\frac{a^2}{2n}\right).$$

For $n = 3$ we find that

    a          0        1         2         3
    Exact      1        0.75      0.3       0.05
    PB2        1        0.7574    0.3291    0.0821
    PB2-E      0        0.0074    0.0291    0.0321
    PB3        1        0.7165    0.2636    0.0498
    PB3-E      0       -0.0335   -0.0364   -0.0002
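The table can be recomputed with a few lines of Python. This is a minimal sketch: the function names are assumptions, and the formulas are the exact probability $\binom{2n}{n+a}/\binom{2n}{n}$ together with $PB_2$ and $PB_3$ as defined above. The recomputed values agree with the tabulated ones to within about $10^{-4}$.

```python
from math import comb, exp

def exact(n, a):
    """Exact P(D+_{n,n} >= a / sqrt(2n)) = C(2n, n+a) / C(2n, n)."""
    return comb(2 * n, n + a) / comb(2 * n, n)

def pb2(n, a):
    """Candidate bound with the finite-sampling factor (2n-1)/(2n)."""
    return exp(-2 * (2 * n - 1) / (2 * n) * a ** 2 / (2 * n))

def pb3(n, a):
    """Candidate bound exp(-2 t^2) with t = a / sqrt(2n)."""
    return exp(-2 * a ** 2 / (2 * n))

def table(n):
    """Rows (a, exact, PB2, PB2 - exact, PB3, PB3 - exact) for a = 0..n."""
    rows = []
    for a in range(n + 1):
        e = exact(n, a)
        rows.append((a, e, pb2(n, a), pb2(n, a) - e, pb3(n, a), pb3(n, a) - e))
    return rows

for row in table(3):
    print("a=%d exact=%.4f PB2=%.4f PB2-E=%+.4f PB3=%.4f PB3-E=%+.4f" % row)
```

A negative entry in the last column means the exact probability exceeds $PB_3$, so $\exp(-2t^2)$ alone cannot serve as an upper bound.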


Further comparisons for $m = n = 10, 12, 13, 14, 15, 25$ support the validity of the bound involving the finite sampling fraction $f_n$. These comparisons agree with the following theorem:


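Theorem 3. A. When $m = n$ (so that $N = 2n$) the second bound in (2.1) holds for all $n \ge 1$ with $C = 1$:

$$P(D^+_{n,n} \ge t) = P\left(\sqrt{\frac{mn}{N}}\,\sup_x(\mathbb{F}_m(x) - \mathbb{G}_n(x)) \ge t\right) \le \exp\left(-2\,\frac{N-1}{N}\,t^2\right) \quad \text{for all } t > 0. \tag{3.1}$$

Equivalently, when $m = n$,

$$P\left(\sqrt{\frac{N-1}{N}}\sqrt{\frac{n}{2}}\,\sup_x(\mathbb{F}_n(x) - \mathbb{G}_n(x)) \ge t\right) \le \exp(-2t^2) \tag{3.2}$$

for all $t > 0$.

B. On the other hand, when $m = n$ (so that $N = 2n$), for all $n \ge 1$ we have

$$P(D^+_{n,n} \ge t) > \exp(-2t^2) \quad \text{for all } 0 < t \le 1. \tag{3.3}$$
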
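Both parts of Theorem 3 can be spot-checked at the lattice points $t = a/\sqrt{2n}$. The sketch below is an illustrative check, not a proof: it tests (3.1) for $a \in \{1, \ldots, n\}$ and (3.3) for $1 \le a \le \sqrt{2n}$ (that is, $0 < t \le 1$) over a range of $n$. The function names and the cutoff `n_max` are assumptions of this example.

```python
from math import comb, exp, isqrt

def exact(n, a):
    """Exact P(D+_{n,n} >= a / sqrt(2n))."""
    return comb(2 * n, n + a) / comb(2 * n, n)

def upper_31(n, a):
    """Right side of (3.1) at t = a / sqrt(2n), with N = 2n."""
    t2 = a * a / (2 * n)
    return exp(-2 * (2 * n - 1) / (2 * n) * t2)

def lower_33(n, a):
    """Right side of (3.3) at t = a / sqrt(2n)."""
    return exp(-2 * a * a / (2 * n))

def theorem3_holds(n_max):
    """True if (3.1) and (3.3) hold at every lattice point for n <= n_max."""
    for n in range(1, n_max + 1):
        for a in range(1, n + 1):
            if exact(n, a) > upper_31(n, a):
                return False
        for a in range(1, isqrt(2 * n) + 1):       # t = a / sqrt(2n) <= 1
            if exact(n, a) <= lower_33(n, a):
                return False
    return True

print(theorem3_holds(150))
```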


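Proof. A. Since the inequality holds trivially for $a = 0$, and can be shown easily by numerical computation for $n \in \{1, 2, 3\}$ (see the Table above for $n = 3$), it suffices to show that

$$\binom{2n}{n+a}\Big/\binom{2n}{n} \le \exp\left(-2\,\frac{2n-1}{2n}\,\frac{a^2}{2n}\right)$$

for $a \in \{1, \ldots, n\}$ and $n \ge 4$. Furthermore, we will show that it holds for $a = n$ in a separate argument, and thus it suffices to show that it holds for $a \in \{1, \ldots, n-1\}$ and $n \ge 4$. By rewriting the numerator and denominator on the left side of the last display, the desired inequality can be rewritten as

$$\frac{n!\,n!}{(n-a)!\,(n+a)!} \le \exp\left(-\frac{2n-1}{2n}\,\frac{a^2}{n}\right).$$

By taking logarithms we can rewrite this as

$$\log\frac{n!\,n!}{(n-a)!\,(n+a)!} + \frac{2n-1}{2n}\,\frac{a^2}{n} \le 0. \tag{3.4}$$

Now by Stirling’s formula with bounds (see e.g. Nanjundiah [1959]) we have

$$\sqrt{2\pi k}\left(\frac{k}{e}\right)^k\exp\left(\frac{1}{12k} - \frac{1}{360k^3}\right) \le k! \le \sqrt{2\pi k}\left(\frac{k}{e}\right)^k\exp\left(\frac{1}{12k}\right). \tag{3.5}$$

Using these bounds in (3.4) we find that the left side is bounded above by

$$-n\left\{\left(1 - \frac{a}{n}\right)\log\left(1 - \frac{a}{n}\right) + \left(1 + \frac{a}{n}\right)\log\left(1 + \frac{a}{n}\right)\right\} - \frac{1}{2}\left(\log\left(1 - \frac{a}{n}\right) + \log\left(1 + \frac{a}{n}\right)\right)$$

$$\qquad + \left\{\frac{1}{6n} - \frac{1}{12(n-a)} - \frac{1}{12(n+a)} + \frac{1}{360}\left(\frac{1}{(n-a)^3} + \frac{1}{(n+a)^3}\right)\right\} + \left\{\frac{a^2}{n} - \frac{a^2}{2n^2}\right\}$$

$$=: I_1 + I_2 + I_3 + I_4.$$

Note that $I_1$ and $I_2$ are as defined in Wei and Dudley [2012] page 640, while $I_3$ and $I_4$ differ. From Wei and Dudley [2012] page 640,

$$I_1 \le -\frac{a^2}{n} - \frac{a^4}{6n^3} - \frac{a^6}{15n^5} - \frac{a^8}{28n^7} \tag{3.6}$$

(which is proved by Taylor expansion of $(1 + x)\log(1 + x) + (1 - x)\log(1 - x)$ about $x = 0$), and

$$I_2 \le \frac{a^2}{2n^2} + \frac{a^4}{4n^4} + \frac{a^6}{6n^6(1 - a^2/n^2)}. \tag{3.7}$$

Note that the lead term in the bound (3.6) for $I_1$ and the lead term of $I_4$ cancel each other, while the first term of the bound (3.7) for $I_2$ cancels the second term of $I_4$. Adding the bounds yields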
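The ingredients of this step can be spot-checked in floating point. The Python sketch below checks the Stirling bounds (3.5) on the log scale, the claim that $I_1 + I_2 + I_3 + I_4$ dominates the left side of (3.4), and the bounds (3.6) and (3.7). It is a minimal sketch: the function names are assumptions, `math.lgamma` stands in for $\log k!$, and small tolerances absorb rounding error.

```python
from math import lgamma, log, pi

def stirling_log_bounds(k):
    """Lower and upper bounds for log k! from (3.5)."""
    base = 0.5 * log(2 * pi * k) + k * log(k) - k
    return base + 1 / (12 * k) - 1 / (360 * k ** 3), base + 1 / (12 * k)

def lhs_34(n, a):
    """Left side of (3.4), using lgamma(k + 1) = log k!."""
    return (2 * lgamma(n + 1) - lgamma(n - a + 1) - lgamma(n + a + 1)
            + (2 * n - 1) / (2 * n) * a * a / n)

def i_terms(n, a):
    """The four terms I_1, I_2, I_3, I_4 (requires 1 <= a < n)."""
    x = a / n
    i1 = -n * ((1 - x) * log(1 - x) + (1 + x) * log(1 + x))
    i2 = -0.5 * (log(1 - x) + log(1 + x))
    i3 = (1 / (6 * n) - 1 / (12 * (n - a)) - 1 / (12 * (n + a))
          + (1 / (n - a) ** 3 + 1 / (n + a) ** 3) / 360)
    i4 = a * a / n - a * a / (2 * n * n)
    return i1, i2, i3, i4

def bound_36(n, a):
    """Right side of (3.6)."""
    return -(a ** 2 / n + a ** 4 / (6 * n ** 3)
             + a ** 6 / (15 * n ** 5) + a ** 8 / (28 * n ** 7))

def bound_37(n, a):
    """Right side of (3.7)."""
    x2 = a * a / (n * n)
    return x2 / 2 + x2 ** 2 / 4 + x2 ** 3 / (6 * (1 - x2))

print(stirling_log_bounds(10), lgamma(11))
print(lhs_34(10, 3), sum(i_terms(10, 3)))
```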

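$$I_1 + I_2 + I_3 + I_4 \le -\frac{a^4}{12n^3} - \frac{a^4}{12n^3} - \frac{a^6}{15n^5} - \frac{a^8}{28n^7} + \frac{a^4}{4n^4} + \frac{a^6}{6n^6(1 - a^2/n^2)} + I_3$$

$$= -\frac{a^4}{n^3}\left(\frac{1}{12} - \frac{1}{4n}\right) - \frac{a^4}{12n^3} - \frac{a^6}{n^5}\left(\frac{1}{15} - \frac{1}{6n(1 - a^2/n^2)}\right) - \frac{a^8}{28n^7} + I_3$$

$$\le -\frac{a^4}{n^3}\left(\frac{1}{12} - \frac{1}{4n}\right) - \frac{a^4}{12n^3} - \frac{a^6}{n^5}\left(\frac{1}{15} - \frac{1}{6(2 - 1/n)}\right) - \frac{a^8}{28n^7} + I_3$$

$$\le -\frac{a^4}{n^3}\left(\frac{1}{12} - \frac{1}{4n}\right) - \frac{a^4}{12n^3} + \frac{3a^6}{105n^5} - \frac{a^8}{28n^7} + I_3$$

$$= -\frac{a^4}{n^3}\left(\frac{1}{12} - \frac{1}{4n}\right) - \frac{a^4}{12n^3}\left(1 - \frac{36a^2}{105n^2}\right) - \frac{a^8}{28n^7} + I_3$$

$$\le -\frac{a^4}{n^3}\left(\frac{1}{12} - \frac{1}{4n}\right) - \frac{a^4}{21n^3} - \frac{a^8}{28n^7} + I_3$$

$$=: R_{12} + I_3.$$

Here the second inequality uses $n(1 - a^2/n^2) \ge 2 - 1/n$ for $a \le n - 1$, the third uses $n \ge 4$, and the last uses $a \le n$.
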


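Now $R_{12} \le 0$ for $n \ge 4$ and $I_3 < 0$ for all $n \ge 2$ and $a \in \{1, \ldots, n-1\}$ by the following argument:

$$I_3 = \frac{1}{6n} - \frac{1}{12(n-a)} - \frac{1}{12(n+a)} + \frac{1}{360}\left(\frac{1}{(n+a)^3} + \frac{1}{(n-a)^3}\right)$$

$$= -\frac{a^2}{6n(n^2 - a^2)} + \frac{2}{360}\,\frac{n(n^2 + 3a^2)}{(n^2 - a^2)^3}$$

$$= -\frac{1}{6n(n^2 - a^2)}\left\{a^2 - \frac{2}{60}\,\frac{n^2(n^2 + 3a^2)}{(n^2 - a^2)^2}\right\}$$

$$= -\frac{1}{6n(n^2 - a^2)}\left\{a^2 - \frac{1}{30}\,\frac{n^2(n^2 - a^2 + 4a^2)}{(n^2 - a^2)^2}\right\}$$

$$= -\frac{1}{6n(n^2 - a^2)}\left\{a^2\left(1 - \frac{2}{15}\,\frac{n^2}{(n^2 - a^2)^2}\right) - \frac{n^2(n^2 - a^2)}{30(n^2 - a^2)^2}\right\}$$

$$\le -\frac{1}{6n(n^2 - a^2)}\left\{a^2\left(1 - \frac{2}{15}\cdot\frac{1}{3}\right) - \frac{n^2}{30(n^2 - a^2)}\right\}$$

by using $a \le n - 1$, so $n^2 - a^2 \ge n^2 - (n-1)^2 = 2n - 1$, and $n^2/(2n-1)^2 \le 1/3$ for $n \ge 4$, and

$$-\frac{1}{6n(n^2 - a^2)}\left\{a^2\left(1 - \frac{2}{3\cdot 15}\right) - \frac{n^2 - a^2 + a^2}{30(n^2 - a^2)}\right\}$$

$$= -\frac{1}{6n(n^2 - a^2)}\left\{a^2\left(1 - \frac{2}{3\cdot 15} - \frac{1}{30(n^2 - a^2)}\right) - \frac{1}{30}\right\}$$

$$\le -\frac{1}{6n(n^2 - a^2)}\left\{a^2\left(1 - \frac{2}{3\cdot 15} - \frac{1}{30(2n - 1)}\right) - \frac{1}{30}\right\}$$

$$\le -\frac{1}{6n(n^2 - a^2)}\left\{a^2\left(1 - \frac{31}{630}\right) - \frac{1}{30}\right\}$$

for $n \ge 4$. This is a decreasing function of $a$ for fixed $n$, and hence to show that it is $< 0$ it suffices to check it for $a = 1$. But when $a = 1$ the right side above equals

$$-\frac{1}{6n(n^2 - 1^2)}\left\{1 - \frac{31}{630} - \frac{1}{30}\right\} = -\frac{1}{n(n^2 - 1)}\left(\frac{289}{6\cdot 315}\right) < -\frac{1}{n(n^2 - 1)}\left(\frac{280}{6\cdot 315}\right) = -\frac{4}{27n(n^2 - 1)} < 0,$$

so we conclude that $I_3 < 0$ for $a \in \{1, \ldots, n-1\}$ and $n \ge 4$. It remains only to show that the desired bound holds for $a = n$; that is, we have

$$\frac{(n!)^2}{(2n)!} \le \exp(-(n - 1/2)).$$

But this can easily be shown via the Stirling formula bounds (3.5). Thus

$$\exp(I_1 + I_2 + I_3) \le \exp(-I_4) = \exp\left(-\frac{2n-1}{2n}\,\frac{a^2}{n}\right),$$

and the claimed inequality holds for all $n \ge 4$. Since the bounds hold for $n = 1, 2, 3$ by direct numerical computation, the claim follows.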
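The algebra for $I_3$ can be verified in exact rational arithmetic. The following Python sketch, whose function names are assumptions of this example, checks that the closed form $-\frac{a^2}{6n(n^2-a^2)} + \frac{2}{360}\frac{n(n^2+3a^2)}{(n^2-a^2)^3}$ equals the definition of $I_3$, that the final upper bound with the constant $31/630$ dominates $I_3$ and is negative, and that the boundary case $a = n$ holds for a range of $n$.

```python
from fractions import Fraction as Fr
from math import exp, factorial

def i3(n, a):
    """I_3 as defined in the decomposition (requires 1 <= a < n)."""
    return (Fr(1, 6 * n) - Fr(1, 12 * (n - a)) - Fr(1, 12 * (n + a))
            + Fr(1, 360) * (Fr(1, (n + a) ** 3) + Fr(1, (n - a) ** 3)))

def i3_closed(n, a):
    """Closed form obtained after combining fractions."""
    d = n * n - a * a
    return -Fr(a * a, 6 * n * d) + Fr(2, 360) * Fr(n * (n * n + 3 * a * a), d ** 3)

def i3_upper(n, a):
    """Final upper bound, valid for n >= 4 and 1 <= a <= n - 1."""
    d = n * n - a * a
    return -Fr(1, 6 * n * d) * (a * a * (1 - Fr(31, 630)) - Fr(1, 30))

def boundary_case_holds(n):
    """Check (n!)^2 / (2n)! <= exp(-(n - 1/2)), the case a = n."""
    return factorial(n) ** 2 / factorial(2 * n) <= exp(-(n - 0.5))

print(i3(4, 3), i3_closed(4, 3), i3_upper(4, 3))
print(all(boundary_case_holds(n) for n in range(1, 101)))
```

Using `Fraction` avoids any rounding ambiguity; for example at $n = 4$, $a = 3$ the bound is only slightly larger than $I_3$ itself.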

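B. We first define

$$r_n(a) = \log\left\{\frac{\binom{2n}{n-a}\big/\binom{2n}{n}}{\exp(-2a^2/(2n))}\right\} = \log\binom{2n}{n-a} - \log\binom{2n}{n} + \frac{a^2}{n}.$$

Since we can take $t = a/\sqrt{2n}$, it suffices to show that $r_n(a) > 0$ for $1 \le a \le \sqrt{2n}$. We will first show this for $n \ge 31$. Then the proof will be completed by checking the inequality numerically for $1 \le a \le \sqrt{2n}$ and $n \in \{1, \ldots, 30\}$.

By using the Stirling formula bounds of (3.5) as in the proof of A, but now with upper bounds replaced by lower bounds, we find that

$$r_n(a) = 2\log(n!) - \log(n-a)! - \log(n+a)! + \frac{a^2}{n}$$

$$\ge -n\left\{\left(1 - \frac{a}{n}\right)\log\left(1 - \frac{a}{n}\right) + \left(1 + \frac{a}{n}\right)\log\left(1 + \frac{a}{n}\right)\right\} - \frac{1}{2}\left\{\log\left(1 - \frac{a}{n}\right) + \log\left(1 + \frac{a}{n}\right)\right\}$$

$$\qquad + \left\{\frac{1}{6n} - \frac{1}{180n^3} - \frac{1}{12(n-a)} - \frac{1}{12(n+a)}\right\} + \frac{a^2}{n}$$

$$=: L_1 + L_2 + L_3 + L_4.$$

As in (3.6) and (3.7) and the displays following them we find that

$$L_1 \ge -n\left\{\frac{a^2}{n^2} + \frac{a^4}{6n^4} + \frac{a^6}{15n^6} + \frac{a^8}{28n^8(1 - a^2/n^2)}\right\},$$

$$L_2 \ge \frac{a^2}{2n^2} + \frac{a^4}{4n^4} + \frac{a^6}{6n^6},$$

$$L_3 = -\frac{a^2}{6n(n^2 - a^2)} - \frac{1}{180n^3},$$

$$L_4 = \frac{a^2}{n}.$$

Putting these pieces together and rearranging, we find that showing

$$r_n(a) \ge \left\{\frac{31a^2}{64n^2} + \frac{a^4}{4n^4} + \frac{a^6}{6n^6} - \frac{a^4}{6n^3} - \frac{a^6}{15n^5} - \frac{a^8}{28n^5(n^2 - a^2)}\right\} \tag{3.8}$$

$$\qquad + \left\{\frac{a^2}{64n^2} + \frac{1}{6n} - \frac{1}{180n^3} - \frac{1}{12(n+a)} - \frac{1}{12(n-a)}\right\} \tag{3.9}$$

$$=: K_1 + K_2 > 0$$

will prove the claim. Note in (3.8) that the $a^2/n$ term cancelled by virtue of the lower bound estimate based on the Taylor expansion of $(1 + x)\log(1 + x) + (1 - x)\log(1 - x)$. First note that

$$K_2 = \frac{a^2}{64n^2} + \frac{1}{6n} - \frac{1}{180n^3} - \frac{1}{12(n+a)} - \frac{1}{12(n-a)}$$

$$= \frac{a^2[28n^3 - 45a^2n] + a^2[16n^3 - 480n^2] + [a^2n^3 + 16a^2 - 16n^2]}{2880n^3(n-a)(n+a)}.$$

The denominator of the right-hand side is clearly positive for $a \in \{1, 2, \ldots, \lfloor\sqrt{2n}\rfloor\}$. By inspection, we can see the term $a^2n^3 + 16a^2 - 16n^2$ in the numerator is increasing in $a$. Picking $a = 1$, we then see $n^3 + 16 - 16n^2 > 0$ for $n \ge 31$, and thus $a^2n^3 + 16a^2 - 16n^2 > 0$ for all admissible $a$. Next, the polynomial $28n^3 - 45a^2n$ is decreasing in the admissible $a$. For any fixed $n$, the minimum value it can attain is then at least $28n^3 - 90n^2$. For $n \ge 31$, this quantity is positive. Therefore, $28n^3 - 45a^2n > 0$ for all admissible $a$ when $n \ge 31$. Finally, note that $16n^3 - 480n^2 = 16n^2(n - 30) > 0$ for $n \ge 31$. Hence we have shown $K_2 > 0$.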
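The rational identity for $K_2$ is easy to get wrong by hand, so here is a minimal check in exact arithmetic. The Python sketch below (function names are assumptions of this example) compares the five-term definition of $K_2$ with the single fraction over $2880n^3(n-a)(n+a)$ and confirms positivity on the admissible range for a sample of $n \ge 31$.

```python
from fractions import Fraction as Fr
from math import isqrt

def k2(n, a):
    """K_2 as the sum of five terms (requires 1 <= a < n)."""
    return (Fr(a * a, 64 * n * n) + Fr(1, 6 * n) - Fr(1, 180 * n ** 3)
            - Fr(1, 12 * (n + a)) - Fr(1, 12 * (n - a)))

def k2_numerator(n, a):
    """Numerator grouped as in the text."""
    return (a * a * (28 * n ** 3 - 45 * a * a * n)
            + a * a * (16 * n ** 3 - 480 * n * n)
            + (a * a * n ** 3 + 16 * a * a - 16 * n * n))

def k2_closed(n, a):
    """K_2 written over the common denominator 2880 n^3 (n - a)(n + a)."""
    return Fr(k2_numerator(n, a), 2880 * n ** 3 * (n - a) * (n + a))

def k2_positive(n):
    """True if K_2 > 0 for every a in {1, ..., floor(sqrt(2n))}."""
    return all(k2(n, a) > 0 for a in range(1, isqrt(2 * n) + 1))

print(k2(31, 1) == k2_closed(31, 1), k2_positive(31))
```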
We next have

$$K_1 = \frac{31a^2}{64n^2} - \frac{a^4}{6n^3} + \frac{a^4}{4n^4} - \frac{a^6}{15n^5} + \frac{a^6}{6n^6} - \frac{a^8}{28n^5}\left(\frac{1}{n^2 - a^2}\right)$$

$$= \left(\frac{a^2}{192n^3}\right)(93n - 32a^2) + \left(\frac{a^4}{60n^5}\right)(15n - 4a^2) + \left(\frac{a^6}{84n^6(n^2 - a^2)}\right)(14n^2 - 3a^2n - 14a^2)$$

$$=: [(\alpha)(93n - 32a^2)] + [(\beta)(15n - 4a^2)] + [(\gamma)(14n^2 - 3a^2n - 14a^2)]. \tag{3.10}$$

Again since $a \in \{1, \ldots, \lfloor\sqrt{2n}\rfloor\}$, it is clear that $\alpha$, $\beta$, and $\gamma$ in (3.10) are positive for all admissible choices of $a$. Hence, the sign of each bracketed term will be dictated by the remaining polynomial in $a$. It is also clear from their form that each polynomial is decreasing in $a$; hence we need only evaluate at the endpoint $a = \sqrt{2n}$ to determine positivity. But $93n - 32(\sqrt{2n})^2 = 29n > 0$, $15n - 4(\sqrt{2n})^2 = 15n - 8n = 7n > 0$, and $14n^2 - 3(\sqrt{2n})^2n - 14(\sqrt{2n})^2 = 14n^2 - 6n^2 - 28n = 4n(2n - 7) > 0$ with the final inequality following as $n \ge 31$. Hence all terms in (3.10) are positive and so $K_1 > 0$. Together with $K_2 > 0$ as proved above, the claim is proved for $n \ge 31$.

Since the bound holds for $a \in \{1, \ldots, \lfloor\sqrt{2n}\rfloor\}$ and $n \in \{1, \ldots, 30\}$ by direct numerical computation, the claim follows. □
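The two computational ingredients of part B can be reproduced as follows. This Python sketch, with function names that are assumptions of this example, evaluates $r_n(a)$ directly for $n \le 30$ and $1 \le a \le \lfloor\sqrt{2n}\rfloor$, and checks in exact arithmetic that the factored form (3.10) agrees with the six-term definition of $K_1$.

```python
from fractions import Fraction as Fr
from math import comb, isqrt, log

def r(n, a):
    """r_n(a) = log C(2n, n-a) - log C(2n, n) + a^2 / n."""
    return log(comb(2 * n, n - a)) - log(comb(2 * n, n)) + a * a / n

def k1(n, a):
    """K_1 as the six-term expression (requires 1 <= a < n)."""
    return (Fr(31 * a ** 2, 64 * n ** 2) - Fr(a ** 4, 6 * n ** 3)
            + Fr(a ** 4, 4 * n ** 4) - Fr(a ** 6, 15 * n ** 5)
            + Fr(a ** 6, 6 * n ** 6)
            - Fr(a ** 8, 28 * n ** 5 * (n * n - a * a)))

def k1_factored(n, a):
    """K_1 in the factored form (3.10)."""
    return (Fr(a ** 2, 192 * n ** 3) * (93 * n - 32 * a ** 2)
            + Fr(a ** 4, 60 * n ** 5) * (15 * n - 4 * a ** 2)
            + Fr(a ** 6, 84 * n ** 6 * (n * n - a * a))
            * (14 * n ** 2 - 3 * a ** 2 * n - 14 * a ** 2))

def small_n_check(n_max=30):
    """True if r_n(a) > 0 for all n <= n_max and 1 <= a <= floor(sqrt(2n))."""
    return all(r(n, a) > 0
               for n in range(1, n_max + 1)
               for a in range(1, isqrt(2 * n) + 1))

print(small_n_check())
print(k1(31, 7) == k1_factored(31, 7), k1(31, 7) > 0)
```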
4. Some comparisons and connections
4.1. Comparisons: two-sided tail bounds

Here we compare and contrast our results with those of Wei and Dudley [2012]. As in Wei and Dudley [2012] (see also Wei and Dudley [2011]), we say that the DKW inequality holds for given $m$, $n$ and $C$ if

$$P(D_{m,n} \ge t) \le C\exp(-2t^2) \quad \text{for all } t > 0,$$

and we say that the DKWM inequality holds for given $m$, $n$ if the inequality in the last display holds with $C = 2$. Wei and Dudley [2012] prove the following theorem:

Theorem 4. (Wei and Dudley, 2012) For $m = n$ in the two sample case:

(a) The DKW inequality always holds with $C = e \doteq 2.71828$.

(b) For $m = n \ge 4$, the smallest $n$ such that $H_c$ can be rejected at level 0.05, the DKW inequality holds with $C = 2.16863$.

(c) The DKWM inequality holds for all $m = n \ge 458$.

(d) For each $m = n < 458$, the DKWM inequality fails for some $t$ of the form $t = k/\sqrt{2n}$.

(e) For each $m = n < 458$, the DKW inequality holds for $C = 2(1 + \delta_n)$ for some $\delta_n > 0$ where, for $12 \le n \le 457$,

$$\delta_n < -\frac{0.07}{n} + \frac{40}{n^2} - \frac{400}{n^3}.$$

For comparison, the following theorem follows from Theorem 3. We say that the modified DKWM inequality holds for given $m$, $n$ if

$$P(D_{m,n} \ge t) \le 2\exp\left(-2\,\frac{N-1}{N}\,t^2\right) \quad \text{for all } t > 0.$$

Theorem 5. For $m = n$ in the two sample case:

(a) For all $n \ge 1$ the modified DKWM inequality holds.

(b) Alternatively, for the modified Kolmogorov statistic given by

$$\sqrt{\frac{N-1}{N}}\,D_{m,n} = \sqrt{\frac{N-1}{N}}\sqrt{\frac{mn}{N}}\,\sup_x|\mathbb{F}_m(x) - \mathbb{G}_n(x)|,$$

the DKWM inequality holds for all $n \ge 1$.

We are not claiming that our “modified” version of the DKWM inequality improves on the results of Wei and Dudley [2012]: it is clearly worse for $m = n \ge 458$. On the other hand, it may provide a useful clue to the formulation of DKWM type exponential bounds for two-sample Kolmogorov statistics when $m \ne n$. In this direction we have the following conjecture:

Conjecture: For any $m$, $n$,

$$P(D^+_{m,n} \ge t) \le \exp\left(-2\,\frac{N-1}{N}\,t^2\right) \quad \text{for all } t > 0, \tag{4.1}$$

$$P(D_{m,n} \ge t) \le 2\exp\left(-2\,\frac{N-1}{N}\,t^2\right) \quad \text{for all } t > 0. \tag{4.2}$$

That is, we conjecture that the modified DKWM inequality holds for all $m, n \ge 1$. This is supported by all the numerical experiments we have conducted so far.
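One way such numerical experiments could be set up is sketched below in Python. Under the null hypothesis with continuous $F$, every arrangement of the $m$ X's and $n$ Y's in the pooled ordering is equally likely, so $P(D^+_{m,n} \ge t)$ can be computed by counting lattice paths from $(0,0)$ to $(m,n)$ that stay below a barrier. This is an assumed implementation for illustration, not the authors' code; the integer threshold $c$ corresponds to $t = c/\sqrt{mnN}$, and all function names are hypothetical.

```python
from math import comb, exp

def paths_below(m, n, c):
    """Count lattice paths (0,0)->(m,n) with n*i - m*j < c at every vertex.

    Vertex (i, j) means i X's and j Y's have been seen, so
    F_m - G_n = i/m - j/n = (n*i - m*j) / (m*n) at that point.
    """
    prev = [0] * (n + 1)
    for i in range(m + 1):
        row = [0] * (n + 1)
        for j in range(n + 1):
            if n * i - m * j >= c:
                continue                      # barrier reached: no path survives here
            if i == 0 and j == 0:
                row[j] = 1
            else:
                row[j] = (row[j - 1] if j > 0 else 0) + (prev[j] if i > 0 else 0)
        prev = row
    return prev[n]

def one_sided_exact(m, n, c):
    """P(D+_{m,n} >= t) with t = c / sqrt(m*n*(m+n)), c a positive integer."""
    return 1.0 - paths_below(m, n, c) / comb(m + n, m)

def conjectured_bound(m, n, c):
    """Right side of (4.1) at t = c / sqrt(m*n*N)."""
    N = m + n
    t2 = c * c / (m * n * N)
    return exp(-2.0 * (N - 1) / N * t2)

def violations(m, n):
    """Thresholds c at which the exact probability exceeds the bound (4.1)."""
    return [c for c in range(1, m * n + 1)
            if one_sided_exact(m, n, c) > conjectured_bound(m, n, c) + 1e-12]

print(one_sided_exact(3, 3, 3), one_sided_exact(3, 3, 6), one_sided_exact(3, 3, 9))
print(violations(5, 5), violations(4, 7))
```

For $m = n$ the threshold $c = an$ recovers the closed form $\binom{2n}{n+a}/\binom{2n}{n}$, which gives a quick consistency check before trying unequal sample sizes.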
4.2. Comparisons: one-sided tail bounds

Wei and Dudley [2012] do not treat bounds for the one-sided statistics. Here we summarize our results with a theorem which parallels their Theorem 4 above. In analogy with their terminology, we say that the one-sided DKW inequality holds for given $m$, $n$ and $C$ if

$$P(D^+_{m,n} \ge t) \le C\exp(-2t^2) \quad \text{for all } t > 0,$$

and we say that the one-sided DKWM inequality holds for given $m$, $n$ if the inequality in the last display holds with $C = 1$. Moreover, we say that the modified one-sided DKWM inequality holds for given $m$, $n$ if

$$P(D^+_{m,n} \ge t) \le \exp\left(-2\,\frac{N-1}{N}\,t^2\right) \quad \text{for all } t > 0.$$

Theorem 6. For $m = n$ in the two sample case:

(a) The one-sided DKW inequality holds for all $n \ge 1$ with $C = e/2 = 2.71828/2 = 1.35914$. For this range of $n$, $C = e/2$ is sharp since equality occurs for $n = 1$ and $t = 1/\sqrt{2}$ (or $a = t\sqrt{2n} = 1$).

(b) For $m = n \ge 5$, the one-sided DKW inequality holds with $C = 2.16863/2 = 1.084315$.

(c) The one-sided DKWM inequality fails for all $m = n \ge 1$.

(d) The modified one-sided DKWM inequality holds for all $m = n \ge 1$.

Proof. (c) follows from Theorem 3-B. (d) follows from Theorem 3-A. It remains only to prove (a) and (b). To prove (a), we first note that Wei and Dudley [2012] showed that for $n \ge 108$ we have

$$\binom{2n}{n+a}\Big/\binom{2n}{n} \le \exp(-a^2/n) \quad \text{for } \sqrt{3n} \le a \le n$$

$$\le (e/2)\exp(-a^2/n).$$

Thus to prove that the claimed inequality holds for $n \ge 108$, it suffices to show that it holds for $t_0\sqrt{2n} \le a \le \sqrt{3n}$ where $t_0 = \sqrt{(1/2)\log(e/2)}$ is the smallest value of $t$ for which the bound is less than or equal to 1. Proceeding as in the proof of Theorem 3-A, we find that we want to show that

$$\log\frac{n!\,n!}{(n+a)!\,(n-a)!} + \frac{a^2}{n} - \log(e/2) \le 0 \quad \text{for } t_0\sqrt{2n} \le a \le \sqrt{3n}.$$

By the same arguments used in the proof of Theorem 3-A, we find that the left side in the last display is bounded above by

$$\left\{-\frac{a^4}{6n^3} - \frac{a^6}{15n^5} - \frac{a^8}{28n^7} + \frac{a^4}{4n^4} + \frac{a^6}{6n^6(1 - a^2/n^2)} + I_3\right\} + \left\{\frac{a^2}{2n^2} - \log(e/2)\right\} =: K_1 + K_2.$$

Now $K_1 \le 0$ for $n \ge 4$ and $a \in \{1, \ldots, n-1\}$ by the previous proof, and

$$K_2 = \frac{a^2}{2n^2} - \log(e/2) \le 0 \quad \text{for all } a \le \sqrt{3}\sqrt{n}$$

if

$$\frac{3}{2n} \le \log(e/2), \quad \text{or} \quad n \ge \frac{3}{2\log(e/2)} = 4.888\ldots.$$

This completes the proof for $n \ge 108$. Numerical computation easily shows that the claim holds for all $n \in \{1, \ldots, 107\}$.

The proof of (b) is similar upon replacing $e/2$ by 1.084315, and again computing numerically for $n \in \{1, \ldots, 107\}$. □


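Corollary 3. For $n \ge 5$ and $C = 1.084315$,

$$P(D^+_{n,n} \ge t) \le \min\left\{\exp\left(-2(1 - 1/N)t^2\right),\; C\exp(-2t^2)\right\} = \begin{cases} C\exp(-2t^2), & t \ge t_0 = \sqrt{n\log C} \doteq .285\sqrt{n}, \\ \exp(-2(1 - 1/N)t^2), & t \le t_0 = \sqrt{n\log C}. \end{cases}$$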
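The constants in Theorem 6 and the crossover point in Corollary 3 can be explored numerically. The Python sketch below computes, for each $n$, the smallest constant $C$ for which $P(D^+_{n,n} \ge t) \le C\exp(-2t^2)$ holds at all lattice points $t = a/\sqrt{2n}$, and evaluates the two branches of the combined bound at $t_0 = \sqrt{n\log C}$. Function names are assumptions of this example, and the ranges of $n$ are illustrative.

```python
from math import comb, e, exp, log, sqrt

C_B = 2.16863 / 2            # constant from Theorem 6(b)

def exact(n, a):
    """Exact P(D+_{n,n} >= a / sqrt(2n))."""
    return comb(2 * n, n + a) / comb(2 * n, n)

def max_ratio(n):
    """Smallest C with exact(n, a) <= C * exp(-a^2 / n) for all a = 1..n."""
    return max(exact(n, a) * exp(a * a / n) for a in range(1, n + 1))

def combined_bound(n, t, C=C_B):
    """min{exp(-2 (1 - 1/N) t^2), C exp(-2 t^2)} with N = 2n."""
    N = 2 * n
    return min(exp(-2 * (1 - 1 / N) * t * t), C * exp(-2 * t * t))

def crossover(n, C=C_B):
    """t_0 = sqrt(n log C), where the two branches coincide."""
    return sqrt(n * log(C))

print(max_ratio(1), e / 2)
print(max_ratio(4), max_ratio(5), C_B)
print(crossover(1))
```

The printed values for $n = 4$ and $n = 5$ illustrate why part (b) starts at $n = 5$: the required constant at $n = 4$ is slightly above 1.084315.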


Figures 1 and 2 illustrate Theorem 6.

[Figure 1: plot not reproduced. Legend (Bound): DKWM; Modified DKWM; Serfling DKWM.]

Fig 1. Difference between approximations and exact one-sided probabilities $P(D^+_{n,n} \ge t)$ for $n = 128$ and $a \in \{1, 2, \ldots, 128\}$. Negative values indicate the exact probability exceeds the approximation. Serfling DKWM is the bound obtained via the heuristic of section 2, using the sampling fraction $1 - f_n^* = (N - n + 1)/N$. Modified DKWM uses the sampling fraction $1 - f_n = (N - n)/(N - 1)$. DKWM uses the fraction from Wei and Dudley.

[Figure 2: plot not reproduced. Legend (Bound): DKWM; DKWM6a; DKWM6b; Modified DKWM; Serfling DKWM.]

Fig 2. Difference between approximations and exact one-sided probabilities $P(D^+_{n,n} \ge t)$ for $n = 23$ and $a \in \{1, 2, \ldots, 23\}$. Negative values indicate the exact probability exceeds the approximation. DKWM6a corresponds to the DKWM bound with the constant $e/2$, discussed in Theorem 6(a). DKWM6b corresponds to the DKWM bound with the constant $2.16863/2$, discussed in Theorem 6(b).